NSF PAR Search | NSF Public Access Repository


Robust Learning from Noisily Labeled Long-Tailed Data via Fairness Regularizer

Wei, Jiaheng; Zhu, Zhaowei; Niu, Gang; Liu, Tongliang; Liu, Sijia; Sugiyama, Masashi; Liu, Yang (January 2026, AAAI 2026)

Both long-tailed and noisily labeled data frequently appear in real-world applications and pose significant challenges for learning. Most prior works treat either problem in isolation and do not explicitly consider the coupling effects of the two. Our empirical observations reveal that such solutions fail to consistently improve learning when the dataset is long-tailed and contains label noise. Moreover, in the presence of label noise, existing methods do not yield universal improvements across different sub-populations; in other words, some sub-populations gain accuracy at the expense of others. Based on these observations, we introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations. We show that the introduced fairness regularizer improves the performance of tail sub-populations as well as the overall learning performance. Extensive experiments demonstrate the effectiveness of the proposed solution when complemented with certain existing popular robust or class-balanced methods.
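The abstract does not give the regularizer's exact form. As a hypothetical illustration only, one way to penalize "the performance gap between any two sub-populations" is to add the sum of pairwise absolute differences in per-group mean loss to the training objective. The function names, the choice of mean loss (rather than accuracy) as the performance measure, and the weight `lam` are all assumptions; a real training loop would express this in an autodiff framework so the penalty can be backpropagated.

```python
import numpy as np

def group_losses(per_sample_loss, groups, num_groups):
    """Mean loss of each sub-population (e.g. head vs. tail classes)."""
    losses = np.zeros(num_groups)
    for g in range(num_groups):
        mask = groups == g
        # Empty groups contribute zero loss in this sketch.
        losses[g] = per_sample_loss[mask].mean() if mask.any() else 0.0
    return losses

def fairness_penalty(per_sample_loss, groups, num_groups):
    """Hypothetical penalty: sum of absolute loss gaps over all pairs of groups."""
    losses = group_losses(per_sample_loss, groups, num_groups)
    gaps = np.abs(losses[:, None] - losses[None, :])
    # The full gap matrix counts each unordered pair twice, so halve the sum.
    return gaps.sum() / 2.0

def regularized_objective(per_sample_loss, groups, num_groups, lam=0.1):
    """Average training loss plus lam times the pairwise-gap penalty."""
    return per_sample_loss.mean() + lam * fairness_penalty(
        per_sample_loss, groups, num_groups
    )
```

With two groups whose mean losses are 0.3 and 1.2, the penalty is 0.9; it is zero only when every group has the same mean loss, so minimizing it pushes the tail groups toward the head groups' performance.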
PiCO+: Contrastive Label Disambiguation for Robust Partial Label Learning

https://doi.org/10.1109/TPAMI.2023.3342650

Wang, Haobo; Xiao, Ruixuan; Li, Yixuan; Feng, Lei; Niu, Gang; Chen, Gang; Zhao, Junbo (May 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Wei, Jiaheng; Zhu, Zhaowei; Cheng, Hao; Liu, Tongliang; Niu, Gang; Liu, Yang (January 2022, International Conference on Learning Representations)

Existing research on learning with noisy labels mainly focuses on synthetic label noise. Although synthetic label noise has clean structures that greatly facilitate statistical analyses, it often fails to model real-world noise patterns. Recent literature has seen several efforts to offer real-world noisy datasets, e.g., Food-101N, WebVision, and Clothing1M. Yet these efforts suffer from two caveats. First, the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise. Second, these datasets are often large-scale, which may lead to unfair comparisons of robust methods under reasonable and accessible computation budgets. To better understand real-world label noise, it is important to establish controllable, moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, which we name CIFAR-10N and CIFAR-100N, equipping the training datasets of CIFAR-10 and CIFAR-100 with human-annotated real-world noisy labels collected from Amazon Mechanical Turk. We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed ones (e.g., class-dependent label noise). We then benchmark a subset of existing solutions using CIFAR-10N and CIFAR-100N. We also study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. We show that real-world noise patterns indeed pose new and outstanding challenges compared with synthetic label noise. These observations call for rethinking the treatment of noisy labels, and we hope the availability of these two datasets will facilitate the development and evaluation of future solutions for learning with noisy labels.
The corresponding datasets and the leaderboard are publicly available at http://noisylabels.com.
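Because CIFAR-10N and CIFAR-100N pair each human-annotated noisy label with its ground-truth label, simple noise statistics can be measured directly. The sketch below, with hypothetical function names and assuming both label sets are already loaded as integer arrays (no particular file format is assumed), computes the overall noise rate and an empirical class-dependent transition matrix. As the abstract stresses, real-world noise is instance-dependent, so a single class-level matrix is only a coarse summary of it.

```python
import numpy as np

def empirical_transition_matrix(clean, noisy, num_classes):
    """T[i, j] estimates P(noisy = j | clean = i) from paired integer labels."""
    counts = np.zeros((num_classes, num_classes))
    # Accumulate one count per (clean, noisy) pair, handling repeated indices.
    np.add.at(counts, (clean, noisy), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave all-zero rows for classes that never appear in `clean`.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def noise_rate(clean, noisy):
    """Fraction of examples whose noisy label disagrees with the ground truth."""
    return float(np.mean(clean != noisy))
```

For example, with ground-truth labels `[0, 0, 1, 1]` and noisy labels `[0, 1, 1, 1]`, the noise rate is 0.25 and the estimated matrix is `[[0.5, 0.5], [0.0, 1.0]]`.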
To Smooth or Not? When Label Smoothing Meets Noisy Labels

Wei, Jiaheng; Liu, Hangyu; Liu, Tongliang; Niu, Gang; Sugiyama, Masashi; Liu, Yang (January 2022, International Conference on Machine Learning)

Label smoothing (LS) is an emerging learning paradigm that uses a positively weighted average of the hard training labels and uniformly distributed soft labels. LS was shown to act as a regularizer for training data with hard labels and therefore to improve the generalization of the model. It was later reported that LS even helps improve robustness when learning with noisy labels. However, we observe that the advantage of LS vanishes in a high label noise regime. Intuitively, this is due to the increased entropy of ℙ(noisy label|X) when the noise rate is high, in which case further applying LS tends to "over-smooth" the estimated posterior. We further discover that several learning-with-noisy-labels solutions in the literature instead relate more closely to negative/not label smoothing (NLS), which acts counter to LS and is defined as using a negative weight to combine the hard and soft labels. We provide an understanding of the properties of LS and NLS when learning with noisy labels. Among other established properties, we show theoretically that NLS is more beneficial when label noise rates are high. We provide extensive experimental results on multiple benchmarks to support our findings.
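To make the LS/NLS distinction concrete, the sketch below assumes the common mixing form target = (1 - r) · one_hot + r · uniform, where r > 0 gives LS and r < 0 gives the "negative weight" of NLS described above. The function names are hypothetical, and this shows only how targets are built and scored, not the paper's training procedure. NLS targets contain negative entries, so they are not probability distributions, although each row still sums to 1.

```python
import numpy as np

def smooth_labels(labels, num_classes, r):
    """(1 - r) * one_hot + r * uniform. r > 0 gives LS, r < 0 gives NLS."""
    one_hot = np.eye(num_classes)[labels]
    uniform = np.full((len(labels), num_classes), 1.0 / num_classes)
    return (1.0 - r) * one_hot + r * uniform

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between (possibly smoothed) targets and softmax(logits)."""
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

For label 2 out of 4 classes, r = 0.2 gives the target `[0.05, 0.05, 0.85, 0.05]`, while r = -0.2 gives `[-0.05, -0.05, 1.15, -0.05]`, pushing the target away from uniform rather than toward it.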
Estimating Instance-dependent Label-noise Transition Matrix using a Deep Neural Network

Yang, Shuo; Yang, Erkun; Han, Bo; Liu, Yang; Xu, Min; Niu, Gang; Liu, Tongliang (January 2022, International Conference on Machine Learning)

In label-noise learning, estimating the transition matrix is a hot topic, as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited to learn a clean-label classifier from noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper we directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, estimating either the CLTM or the BLTM is ill-posed. Favorably, however, Bayes optimal labels have less uncertainty than clean labels: the class posteriors of Bayes optimal labels are one-hot vectors, while those of clean labels are not. This gives two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected from noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically with a deep neural network, leading to better generalization and superior classification performance.
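The abstract builds on the standard relation between a transition matrix and class posteriors. The sketch below assumes the usual convention T[i, j] = P(noisy = j | label = i, x) for a single instance x, so the noisy posterior is the transpose of T(x) times the label posterior. It only illustrates this forward relation for one given T(x) and why a one-hot Bayes posterior simplifies it; it does not estimate the BLTM, which the paper does with a deep neural network. All function names are hypothetical.

```python
import numpy as np

def noisy_posterior(transition, label_posterior):
    """P(noisy = j | x) = sum_i T[i, j] * P(label = i | x)."""
    return transition.T @ label_posterior

def bayes_label(clean_posterior):
    """Bayes optimal label: the argmax of the clean class posterior."""
    return int(np.argmax(clean_posterior))

def noisy_posterior_from_bayes_label(bayes_transition, bayes_label_index):
    """With a one-hot Bayes posterior, the noisy posterior is one row of T."""
    num_classes = bayes_transition.shape[0]
    one_hot = np.eye(num_classes)[bayes_label_index]
    return noisy_posterior(bayes_transition, one_hot)
```

With T = `[[0.8, 0.2], [0.3, 0.7]]` and a clean posterior of `[0.9, 0.1]`, the noisy posterior is `[0.75, 0.25]`, which mixes two rows of T. If the Bayes label is 0, the noisy posterior is exactly the first row `[0.8, 0.2]`; this reduction to a single row is one way to see why the one-hot property narrows the feasible solutions for T.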
Instance-dependent Label-noise Learning under a Structural Causal Model

Yao, Yu; Liu, Tongliang; Gong, Mingming; Han, Bo; Niu, Gang; Zhang, Kun (January 2021, Advances in Neural Information Processing Systems)
Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.S.; Vaughan, J. Wortman (Ed.)
FgPal1 regulates morphogenesis and pathogenesis in Fusarium graminearum

https://doi.org/10.1111/1462-2920.15266

Yin, Jinrong; Hao, Chaofeng; Niu, Gang; Wang, Wei; Wang, Guanghui; Xiang, Ping; Xu, Jin‐Rong; Zhang, Xue (December 2020, Environmental Microbiology)
